Analysis of Non-Markovian Temporal Networks

An educational tutorial using the free python module pyTempNets

Ingo Scholtes
Chair of Systems Design
ETH Zürich

May 2015

In this tutorial, I will introduce some basic concepts of temporal network analysis, which have been introduced in two recent publications:

  • I Scholtes, N Wider, R Pfitzner, A Garas, CJ Tessone, F Schweitzer: Causality-driven slow-down and speed-up of diffusion in non-Markovian temporal networks, Nature Communications, 5, Sept. 2014
  • R Pfitzner, I Scholtes, A Garas, CJ Tessone, F Schweitzer: Betweenness preference: Quantifying correlations in the topological dynamics of temporal networks, Phys Rev Lett, 110(19), 198701, May 2013

The main point of these works is to highlight the importance of non-Markovian characteristics in time-stamped relational data. In a nutshell, we show that the ordering of links in time-stamped networks matters, and we introduce methods that allow us to study the influence of order correlations in temporal networks. This new perspective on time-stamped network data provides interesting new opportunities not only to better understand dynamical processes on temporal networks, but also for the definition of novel measures of centrality in temporal networks, or the development of new community detection algorithms that take into account both the topological and the temporal dimension of complex systems.

In the following, I illustrate some basic concepts of non-Markovian characteristics in temporal networks. In particular, I will showcase the use of pyTempNet, a free python module which simplifies the analysis and visualization of time-stamped relational data, as well as the simulation of dynamical processes on top of temporal networks. All of the methods and concepts introduced in the two publications listed above are fully implemented in pyTempNet, thus making it particularly easy to apply them to your data.

Installation of pyTempNets is easy. You will need a working installation of python3, as well as the module igraph properly set up. You are then ready to install pyTempNets directly from gitHub by typing:

pip install git+https://github.com/IngoScholtes/pyTempNets.git

Let us start with a new ipython notebook and let us import some basic modules required in the following tutorial. We first import the basic pyTempNet module as follows:

In [1]:
import pyTempNet as tn

Since time-aggregated networks are internally represented as igraph objects (which we can plot and analyze using standard igraph functions), we will also need to import igraph.

In [2]:
import igraph

Finally, we will want to plot some figures and do some basic calculations using numpy and matplotlib.

In [3]:
import numpy as np
import matplotlib.pyplot as plt

You won't have to care too much about the following imports and definitions, as they will simply allow us to directly embed igraph plots and videos into the output of an ipython notebook.

In [4]:
from IPython.display import display, Image, HTML
import base64

def showVideo(filename):
    VIDEO_TAG = """<video controls>
 <source src="data:video/x-m4v;base64,{0}" type="video/mp4">
 Your browser does not support the video tag.
</video>"""
    with open(filename, "rb") as f:
        video_data = f.read()
    return HTML(VIDEO_TAG.format(base64.b64encode(video_data).decode('utf-8')))

Two simple examples

Let us introduce some basic concepts using two simple examples for temporal networks. We first create an empty temporal network object, which we can then use to add time-stamped links one by one. In each call, we pass the source and target node, as well as an integer time stamp indicating when the link was present.

In [5]:
t1 = tn.TemporalNetwork()
t1.addEdge('a', 'c', 1)
t1.addEdge('c', 'e', 2)

t1.addEdge('b', 'c', 3)
t1.addEdge('c', 'd', 4)

t1.addEdge('b', 'c', 5)
t1.addEdge('c', 'e', 6)

t1.addEdge('a', 'c', 7)
t1.addEdge('c', 'd', 8)

A nice way to visualize such networks is the so-called time-unfolded representation, in which we unfold time into an additional spatial (vertical) dimension. pyTempNet allows you to directly generate tikz code for such a representation. We just call the following function:

In [33]:
t1.exportTikzUnfolded('output/t1.tex')

This will produce a LaTeX file which we can simply compile to obtain a PDF figure. Here, I have manually compiled the file and converted the resulting PDF figure to a PNG, which I can display in ipython as follows:

In [34]:
display(Image(filename='output/t1.png'))

In this representation, each of the five nodes is represented by multiple temporal copies. Two subsequent temporal copies are connected whenever a link existed at the respective time stamp. This representation makes it easy to study so-called time-respecting paths.

Apart from time-unfolded representations, pyTempNet can also easily be used to produce videos of temporal networks. For this we can simply export a sequence of PNG movie frames using the following built-in function:

In [6]:
t1.exportMovieFrames('frames/video')

This will generate a sequence of numbered frames starting with the prefix video within the subdirectory frames. We can then convert the resulting frames to an mp4 video using, for instance, ImageMagick. Make sure to install ImageMagick for your platform and run the following command from the terminal:

> convert -delay 50 frames/video* output/video.mp4

We can now show the resulting video of the temporal network shown above. This is temporal network visualization made easy: just click the play button.

In [7]:
showVideo('output/video.mp4')
Out[7]:

Let us now study the extraction of so-called time-respecting paths in temporal networks. This crucial concept is the temporal equivalent of paths in static networks. Different from static networks, links on a time-respecting path have to respect causality, i.e. in order for a path (u,v,t_1) -> (v,w,t_2) of two time-stamped links to connect nodes u and w, the first link has to be present before the second link, i.e. t_1 < t_2. pyTempNet can automatically extract time-respecting paths of length two for you. Sometimes, you may also want to limit the waiting time of time-respecting paths. That is, rather than simply requiring t_1 < t_2, you may want to impose that t_2 - t_1 <= delta for some maximum waiting time delta.
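The causality condition can be made concrete in a few lines of code. The following is an illustrative sketch (not pyTempNet's internal implementation): a hypothetical helper that checks whether two time-stamped links form a time-respecting path under a maximum waiting time delta.

```python
def is_time_respecting(link1, link2, delta=1):
    """Check whether (u,v,t1) -> (v,w,t2) is a time-respecting path.

    Links are (source, target, time) tuples; delta is the maximum
    allowed waiting time between the two links.
    """
    u, v, t1 = link1
    v2, w, t2 = link2
    # The target of the first link must be the source of the second,
    # and the second link must occur later, within the waiting time.
    return v == v2 and 0 < t2 - t1 <= delta

# (a,c,1) followed by (c,e,2) respects causality for delta = 1
print(is_time_respecting(('a', 'c', 1), ('c', 'e', 2)))  # True
# (a,c,1) followed by (c,d,4) exceeds the waiting time for delta = 1
print(is_time_respecting(('a', 'c', 1), ('c', 'd', 4)))  # False
```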

The pyTempNet function extractTwoPaths can be used to extract time-respecting paths of length two. If no delta is specified, delta = 1 is assumed, i.e. only consecutive links will be considered to contribute to time-respecting paths.

In [37]:
t1.extractTwoPaths()

Depending on your data set, this function may require a few seconds (if you have several hundred thousand or even millions of time-stamped links). Once it has completed, we can start analyzing causality structures and non-Markovian characteristics in the temporal network. If you omit the explicit call above, it will automatically be made whenever time-respecting paths are first needed.

Based on the statistics of time-respecting paths, we can calculate betweenness preference, a measure that captures to what extent nodes mediate time-respecting paths between particular pairs of their neighbours in the static network. Being an information-theoretic measure, it can be interpreted in terms of the number of bits of information that are lost when we aggregate all time-stamped links around a node (thus obtaining a time-aggregated network which discards the information on the ordering of links). For details on the definition and interpretation of the measure, I have to refer to the publications mentioned above.

With pyTempNet we can calculate the betweenness preference of a node using the following function. In this particular case, the betweenness preference of node c should be zero, as there is no preference for node c to mediate time-respecting paths between any particular pair of nodes (see also temporal network above).

In [38]:
print('I^b(S;D) = ', tn.Measures.BetweennessPreference(t1,'c'))
I^b(S;D) =  0.0

We can now calculate different time-aggregated representations of our temporal network. The simplest and most commonly used one is the plain time-aggregated network, in which all time-stamps are discarded, while link weights indicate the number or frequency of interactions between nodes.

We can obtain an igraph object representing the (first-order) aggregate network as follows. This igraph object can be plotted using basic igraph functions.

In [39]:
# We can compute and plot the first-order aggregate network
g1 = t1.igraphFirstOrder()

visual_style = {}
visual_style["bbox"] = (600, 400)
visual_style["margin"] = 60
visual_style["vertex_size"] = 80
visual_style["vertex_label_size"] = 24
visual_style["vertex_color"] = "lightblue"
visual_style["edge_curved"] = 0.2
visual_style["edge_width"] = 1
visual_style["edge_arrow_size"] = 2

visual_style["layout"] = g1.layout_auto()
visual_style["vertex_label"] = g1.vs["name"]
visual_style["edge_label"] = g1.es["weight"]

igraph.plot(g1, 'output/t1_G1.png', **visual_style)
display(Image(filename='output/t1_G1.png'))

In one of the papers listed above, we introduced so-called higher-order time-aggregated networks which not only capture the frequency of interactions, but which also capture the ordering of time-stamped links. The idea is that each link in the time-stamped network is represented by a node in the second-order network, while each time-respecting path of length two is represented by a link. Again, details on this abstraction can be found in the papers above.
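To make this construction concrete, we can recompute the second-order links of the first example by hand. The following sketch (illustrative only; pyTempNet's implementation may differ) counts all two-paths (u,v,t) -> (v,w,t+1) in the edge list of t1 and emits one weighted second-order link per distinct two-path:

```python
from collections import Counter

# The time-stamped edges of the first example network t1
edges = [('a', 'c', 1), ('c', 'e', 2), ('b', 'c', 3), ('c', 'd', 4),
         ('b', 'c', 5), ('c', 'e', 6), ('a', 'c', 7), ('c', 'd', 8)]

# Count two-paths: consecutive links (u,v,t) -> (v,w,t+1) through the same node
second_order = Counter()
for (u, v, t1) in edges:
    for (v2, w, t2) in edges:
        if v == v2 and t2 == t1 + 1:
            second_order[('%s-%s' % (u, v), '%s-%s' % (v2, w))] += 1

# Four second-order links, each with weight 1
for link, weight in sorted(second_order.items()):
    print(link, weight)
```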

Here, we can compute (and plot) the second-order network for the temporal network above as follows:

In [40]:
g2 = t1.igraphSecondOrder()

visual_style["layout"] = g2.layout_auto()
visual_style["vertex_label"] = g2.vs["name"]
visual_style["edge_label"] = g2.es["weight"]
igraph.plot(g2, 'output/t1_G2.png', **visual_style)
display(Image(filename='output/t1_G2.png'))

In the example above, we see that a total of four time-respecting paths of length two (represented by the four links) exist in the temporal network, each of them occurring exactly once (hence the link weights of one).

Let us now see what happens if we simply flip the order of two time-stamped links in the temporal network. We do this by defining the following new temporal network (which is identical to the previous one except for one reordering).

In [41]:
t2 = tn.TemporalNetwork()
t2.addEdge('a', 'c', 1)
t2.addEdge('c', 'e', 2)

t2.addEdge('b', 'c', 3)
t2.addEdge('c', 'd', 4)

t2.addEdge('b', 'c', 5)
t2.addEdge('c', 'd', 6)

t2.addEdge('a', 'c', 7)
t2.addEdge('c', 'e', 8)

Again, we can produce a tikz figure showing the time-unfolded representation of this temporal network (which I again manually compiled and converted to a PNG file).

In [42]:
t2.exportTikzUnfolded('output/t2.tex')
display(Image(filename='output/t2.png'))

We now again extract all time-respecting paths of length two and then visualize the first-order time-aggregated network. Not surprisingly, this is exactly the same as before, since we only changed the ordering of time-stamped links.

In [43]:
t2.extractTwoPaths()

g1 = t2.igraphFirstOrder()

visual_style["layout"] = g1.layout_auto()
visual_style["vertex_label"] = g1.vs["name"]
visual_style["edge_label"] = g1.es["weight"]
igraph.plot(g1, 't2_G1.png', **visual_style)
display(Image(filename='t2_G1.png'))

As we will see now, there is an important change in terms of causality structures. Now, node c has a preference to mediate time-respecting paths between the pairs of nodes a and e as well as b and d, respectively. Furthermore, knowing the "source" of a time-respecting path through c completely determines the target (source a determining target e and source b determining target d). Our uncertainty is thus reduced from two equally likely choices to a single one, which corresponds to a betweenness preference of one bit.

In [44]:
print('I^b(S;D) = ', tn.Measures.BetweennessPreference(t2,'c'))
I^b(S;D) =  1.0

We can also output the corresponding (unnormalized) betweenness preference matrix (again see paper for details). The entries provide us with the number of different time-respecting paths through node c. The first row corresponds to node a, the second row corresponds to node b, the first column corresponds to node e, the second column corresponds to node d. Here, the entries reveal that there are two time-respecting paths a -> c -> e (first row) and two time-respecting paths b -> c -> d (second row). Time-respecting paths a -> c -> d and b -> c -> e (off-diagonal zero entries) are absent.

In [45]:
print(tn.Measures.__BWPrefMatrix(t2,'c'))
[[ 2.  0.]
 [ 0.  2.]]
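The one-bit value can be verified by hand: normalizing the matrix above to a joint distribution P(s,d) of time-respecting paths and computing the mutual information between source S and target D. A quick numpy check (illustrative only):

```python
import numpy as np

B = np.array([[2., 0.], [0., 2.]])   # betweenness preference matrix of node c
P = B / B.sum()                       # joint distribution P(s, d)
Ps = P.sum(axis=1)                    # marginal over sources
Pd = P.sum(axis=0)                    # marginal over targets

# Mutual information I(S;D) in bits, skipping zero entries
I = sum(P[i, j] * np.log2(P[i, j] / (Ps[i] * Pd[j]))
        for i in range(2) for j in range(2) if P[i, j] > 0)
print(I)  # 1.0
```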

The changes in the statistics of time-respecting paths caused by the reordering of the time-stamped edges are captured by the fact that the second-order aggregate network is different from the one before (even though the first-order aggregate network is the same!).

In [46]:
g2 = t2.igraphSecondOrder()

visual_style["layout"] = g2.layout_auto()
visual_style["vertex_label"] = g2.vs["name"]
visual_style["edge_label"] = g2.es["weight"]
igraph.plot(g2, 't2_G2.png', **visual_style)
display(Image(filename='t2_G2.png'))

In the second-order network above, we see that each of the two time-respecting paths (a,c) -> (c,e) and (b,c) -> (c,d) now occurs twice, while the theoretically possible time-respecting paths (a,c) -> (c,d) and (b,c) -> (c,e) (which we would actually expect to occur with the same probability as the others) are absent!

You see that this change in the causal structures of the temporal network is exclusively driven by the ordering of time-stamped edges, and it is these phenomena that we will now study in more realistic scenarios.

Reading from files: A larger synthetic example

We now consider a larger (synthetic) example which we read from a TEDGE file, a simple file format which contains a list of time-stamped links. TEDGE files have the format

source target time
a b 4
b c 7
u v 3
u v 3
...

The separation character (here: a space) can be arbitrary and can be specified upon reading. Furthermore, the ordering of columns can be arbitrary, and thus a header line indicating which column refers to the source, target and timestamp of an edge is required. As you can see in the example above, time-stamped edges do not necessarily have to be ordered according to time. In addition to simple integer time stamps, arbitrary string timestamps such as 2015-05-23 09:23:22 can be used. In this case, a format string indicating how to parse the timestamp has to be provided.
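To make the format concrete, here is a minimal illustrative parser for such data (the readFile function in pyTempNets handles this and more, including custom separators and timestamp formats; the helper name parse_tedges is hypothetical):

```python
def parse_tedges(lines, sep=' '):
    """Parse TEDGE lines into (source, target, time) tuples.

    The first line is a header naming the columns; the remaining
    lines may list the columns in any order and need not be sorted
    by time.
    """
    header = lines[0].strip().split(sep)
    src, tgt, tme = (header.index(c) for c in ('source', 'target', 'time'))
    edges = []
    for line in lines[1:]:
        fields = line.strip().split(sep)
        if len(fields) == len(header):
            edges.append((fields[src], fields[tgt], int(fields[tme])))
    return edges

example = ['source target time', 'a b 4', 'b c 7', 'u v 3']
print(parse_tedges(example))  # [('a', 'b', 4), ('b', 'c', 7), ('u', 'v', 3)]
```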

Here, we use a simple example of integer time stamps, reading a file of time-stamped edges that were synthetically generated to include order correlations.

In [47]:
t = tn.TemporalNetwork.readFile('data/example.tedges', sep=' ')
print("Temporal network has", t.vcount(), "nodes")
print("Temporal network has", t.ecount(), "time-stamped edges")
Temporal network has 30 nodes
Temporal network has 13200 time-stamped edges

Let us now extract time-respecting paths of length two and visualize the first-order aggregate network.

In [48]:
t.extractTwoPaths()

g1 = t.igraphFirstOrder()

visual_style = {}
visual_style["bbox"] = (600, 600)
visual_style["margin"] = 60
visual_style["edge_width"] = [x/100 for x in g1.es()["weight"]]
visual_style["vertex_size"] = 20
visual_style["vertex_label_size"] = 12
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_label"] = g1.vs["name"]
visual_style["edge_arrow_size"] = 0.3
visual_style["layout"] = g1.layout_auto()

igraph.plot(g1, 'output/example_g1.png', **visual_style)
display(Image(filename='output/example_g1.png'))

Importantly, the weighted time-aggregated network does not show any specific structure. However, due to the specific ordering of time-stamped edges in the temporal network, the second-order aggregate network can look completely different, meaning that only very specific time-respecting paths actually occur. Here, the second-order network shows three pronounced temporal communities that are not visible in the first-order network.

In [49]:
g2 = t.igraphSecondOrder()

visual_style = {}
visual_style["edge_arrow_size"] = 0.01
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_size"] = 5

igraph.plot(g2, 'output/example_g2.png', **visual_style)
display(Image(filename='output/example_g2.png'))

Analyzing real-world data: Diffusion dynamics in temporal networks

Let us now come to a more interesting case, analyzing real-world relational data sets from a number of different contexts. The first data set that we will study covers time-stamped antenna-antenna interactions in an ant colony and was originally published in:

  • B Blonder, A Dornhaus: Time-Ordered Networks Reveal Limitations to Information Flow in Ant Colonies, PLoS ONE, 2011

Here, we read data on one particular colony (details in our papers above) as a temporal network.

In [68]:
t = tn.TemporalNetwork.readFile('data/ants-1-1_agg_6s_scc.tedges', sep=' ')
print("Temporal network has", t.vcount(), "nodes")
print("Temporal network has", t.ecount(), "time-stamped edges")
Temporal network has 61 nodes
Temporal network has 982 time-stamped edges

Let us first consider the (first-order) time-aggregated network.

In [69]:
g1 = t.igraphFirstOrder()

visual_style = {}
visual_style["bbox"] = (600, 600)
visual_style["margin"] = 60
visual_style["vertex_size"] = 30
visual_style["vertex_label_size"] = 8
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_label"] = g1.vs["name"]
visual_style["edge_arrow_size"] = 0.3
visual_style["layout"] = g1.layout_auto()

igraph.plot(g1, 'output/ants_g1.png', **visual_style)
display(Image(filename='output/ants_g1.png'))

Above, we briefly discussed the time-respecting paths that we would expect if specific order correlations (i.e. betweenness preferences of nodes) were absent in the temporal network. Based on the first-order aggregate network, we can actually compute the expected second-order network under the assumption of a null model in which all interactions are independent of each other. For this, we can use the following pyTempNet function:

In [70]:
g2n = t.igraphSecondOrderNull()

visual_style = {}
visual_style["edge_arrow_size"] = 0.01
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_size"] = 5

igraph.plot(g2n, 'output/ants_g2n.png', **visual_style)
display(Image(filename='output/ants_g2n.png'))

We see that this expected network is rather densely connected, i.e. we expect many time-respecting paths of length two to exist between nodes. We can now compare this to the actual second-order network.

Here we see that, due to the specific ordering of time-stamped links, the actual second-order aggregate network can look completely different, meaning that only very specific time-respecting paths actually occur. In particular, the second-order network is much less connected than the expected one shown above.

In [71]:
g2 = t.igraphSecondOrder()

visual_style = {}
visual_style["edge_arrow_size"] = 0.01
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_size"] = 5

igraph.plot(g2, 'output/ants_g2.png', **visual_style)
display(Image(filename='output/ants_g2.png'))

This (visually) different second-order network shown above is a sign of the presence of non-Markovian characteristics which change causality in the temporal network. We can quantify these characteristics by calculating the entropy growth rate ratio between a first- and a second-order Markov model of the time-stamped edge sequence (see details in the publications above). This value captures to what extent the actual temporal network differs from a Markovian case, in which the next edge in a time-stamped edge sequence is independent of the previous one.
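For intuition, the entropy growth rate of a Markov chain with row-stochastic transition matrix T and stationary distribution pi is H = -sum_i pi_i sum_j T_ij log2 T_ij; the ratio compares this rate for second- and first-order models of the edge sequence. The following toy computation of the growth rate itself is an illustrative sketch, not the EntropyGrowthRateRatio implementation:

```python
import numpy as np

def entropy_growth_rate(T):
    """Entropy growth rate (bits/step) of a Markov chain with row-stochastic T."""
    # Stationary distribution: left eigenvector of T for eigenvalue 1
    vals, vecs = np.linalg.eig(T.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    pi = pi / pi.sum()
    H = 0.0
    for i in range(len(T)):
        for j in range(len(T)):
            if T[i, j] > 0:
                H -= pi[i] * T[i, j] * np.log2(T[i, j])
    return H

# Unbiased two-state chain: one full bit of uncertainty per step
T = np.array([[0.5, 0.5], [0.5, 0.5]])
print(entropy_growth_rate(T))  # 1.0
```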

Here, we observe a value that is strictly smaller than one, which signifies that the data set indeed exhibits non-Markovian characteristics.

In [72]:
print("Entropy growth rate ratio is", tn.Measures.EntropyGrowthRateRatio(t))
Entropy growth rate ratio is 0.423411131818

We can next ask to what extent these characteristics influence dynamical processes on the temporal network. Here, we consider a diffusion process, modeled by the convergence behavior of a random walk on the temporal network. With the next two functions, we can compute the time (i.e. the number of random walk steps) required by a random walker to converge. More precisely, we measure the average number of steps after which the total variation distance between the visitation probabilities and the stationary distribution is smaller than a given threshold epsilon.
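The convergence criterion can be made explicit: iterate the walker's visitation distribution under a transition matrix and count steps until the total variation distance to the stationary distribution drops below epsilon. The following self-contained toy sketch illustrates this measurement (RWDiffusion additionally averages over randomly chosen seed nodes; the helper below is hypothetical):

```python
import numpy as np

def convergence_time(T, epsilon=1e-6, start=0, max_steps=10000):
    """Steps until the walk's distribution is within epsilon (in total
    variation distance) of the stationary distribution."""
    n = len(T)
    # Stationary distribution: left eigenvector of T for eigenvalue 1
    vals, vecs = np.linalg.eig(T.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    pi = pi / pi.sum()
    p = np.zeros(n)
    p[start] = 1.0                    # walker starts on a single node
    for step in range(1, max_steps + 1):
        p = p @ T                     # one step of the random walk
        if 0.5 * np.abs(p - pi).sum() < epsilon:
            return step
    return max_steps

# Lazy random walk on a path of three nodes
T = np.array([[0.5,  0.5,  0.  ],
              [0.25, 0.5,  0.25],
              [0.,   0.5,  0.5 ]])
print(convergence_time(T))
```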

In [73]:
speed_g2 = tn.Processes.RWDiffusion(t.igraphSecondOrder().components(mode="strong").giant(), epsilon=1e-6)
speed_g2n = tn.Processes.RWDiffusion(t.igraphSecondOrderNull().components(mode="strong").giant(), epsilon=1e-6)

print("Empirical slow-down factor for diffusion is", speed_g2/speed_g2n)
Empirical slow-down factor for diffusion is 1.9803149606299213

For small epsilon (i.e. large times t) we empirically observe that non-Markovian characteristics slow down the diffusion process, on average by a factor of approximately two (values will naturally differ between individual runs).

We can actually predict this analytically by means of a spectral analysis of the second-order time-aggregated network. The following function analytically calculates the expected factor by which a diffusion process in a temporal network is slowed down due to non-Markovian characteristics (and the resulting changes in the causality structures of the temporal network).
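The intuition behind such a spectral prediction: asymptotically, the convergence time of a random walk scales with 1/ln|lambda_2|, where lambda_2 is the second-largest eigenvalue (in modulus) of the transition matrix, so a slow-down factor can be sketched as the ratio of these time scales for the actual and the null-model chain. The following toy code illustrates the idea only; the exact definition used by SlowDownFactor is given in the Nature Communications paper above:

```python
import numpy as np

def second_eigenvalue_modulus(T):
    """Modulus of the second-largest eigenvalue of a row-stochastic matrix."""
    return np.sort(np.abs(np.linalg.eigvals(T)))[-2]

def slowdown_factor(T_actual, T_null):
    """Ratio of the asymptotic convergence time scales 1/ln|lambda_2|."""
    return np.log(second_eigenvalue_modulus(T_null)) / \
           np.log(second_eigenvalue_modulus(T_actual))

# Toy chains: the 'actual' one mixes more slowly (|lambda_2| = 0.8 vs 0.2)
T_null   = np.array([[0.6, 0.4], [0.4, 0.6]])
T_actual = np.array([[0.9, 0.1], [0.1, 0.9]])
print(slowdown_factor(T_actual, T_null))  # > 1: diffusion is slowed down
```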

In [74]:
print("Analytical slow-down factor for diffusion is", tn.Measures.SlowDownFactor(t))
Analytical slow-down factor for diffusion is 2.05252345941

Here we expect diffusion in the temporal network to be slower by a factor of about 2.05 (compared to a Markovian temporal network in which no order correlations change causality). This is well in line with our empirical observation above.

Apart from predicting changes in diffusion speed, we can do some extended spectral analysis of the second-order network based on a normalized Laplacian matrix. This analysis confirms that the actual temporal sequence has a smaller algebraic connectivity than a Markovian temporal network. This indicates that the causal topology of time-respecting paths in the temporal network is less connected than expected at random, thus explaining the observed slow-down of diffusion.

This can intuitively be related to the weaker *causal connectivity* in the real temporal network compared to what we expect at random (compare the second-order networks shown above).

In [75]:
print("Algebraic Connectivity (G2) =", tn.Measures.AlgebraicConn(t))
print("Algebraic Connectivity (G2 null) =", tn.Measures.AlgebraicConn(t, model="NULL"))
Algebraic Connectivity (G2) = 0.0426976073824
Algebraic Connectivity (G2 null) = 0.0856700955263
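As a toy illustration of what these numbers measure, here is a direct computation of the algebraic connectivity, i.e. the second-smallest eigenvalue of the normalized Laplacian, for a small undirected graph (an unweighted path of three nodes; illustrative numpy code, not the AlgebraicConn implementation):

```python
import numpy as np

# Adjacency matrix of the undirected path a - b - c
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
d = A.sum(axis=1)                             # node degrees
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(3) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized Laplacian

# Algebraic connectivity: second-smallest eigenvalue (here 0, 1, 2 overall)
eigenvalues = np.sort(np.linalg.eigvalsh(L))
print(eigenvalues[1])
```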

Let us now move to a second data set, which covers e-mail exchanges between employees of a manufacturing company. Again, we can read these data as a temporal network.

In [76]:
t = tn.TemporalNetwork.readFile('data/manufacturing_30d_agg_3600_scc.tedges', sep=' ')
print("Temporal network has", t.vcount(), "nodes")
print("Temporal network has", t.ecount(), "time-stamped edges")
Temporal network has 96 nodes
Temporal network has 3843 time-stamped edges

An entropy growth rate ratio smaller than one confirms that the temporal network exhibits non-Markovian characteristics that are likely to change causality.

In [77]:
print("Entropy growth rate ratio is", tn.Measures.EntropyGrowthRateRatio(t))
Entropy growth rate ratio is 0.620040571125

Based on spectral properties, we analytically predict these characteristics to slow down diffusion by a factor of about 3.01 (compared to a Markovian temporal network).

In [78]:
print("Analytical slow-down factor for diffusion is", tn.Measures.SlowDownFactor(t))
Analytical slow-down factor for diffusion is 3.01549499081

Let us empirically confirm that this analytical prediction is correct ...

In [79]:
speed_g2 = tn.Processes.RWDiffusion(t.igraphSecondOrder().components(mode="strong").giant(), epsilon=1e-7)
speed_g2n = tn.Processes.RWDiffusion(t.igraphSecondOrderNull().components(mode="strong").giant(), epsilon=1e-7)

print("Empirical slow-down factor for diffusion is", speed_g2/speed_g2n)
Empirical slow-down factor for diffusion is 3.124505928853755

Our third data set covers contacts between medical personnel in a hospital, recorded by means of proximity sensing tags. Again, we read these data as a temporal network.

In [80]:
t = tn.TemporalNetwork.readFile('data/Hospital_noADM_agg_300_scc_8_56h.tedges', sep=' ')

print("Temporal network has", t.vcount(), "nodes")
print("Temporal network has", t.ecount(), "time-stamped edges")
Temporal network has 53 nodes
Temporal network has 8571 time-stamped edges

Again, an entropy growth rate ratio smaller than one confirms that the temporal network exhibits non-Markovian characteristics that are likely to change causality.

In [81]:
print("Entropy growth rate ratio is", tn.Measures.EntropyGrowthRateRatio(t))
Entropy growth rate ratio is 0.706942987115

Based on spectral properties, we analytically predict these characteristics to slow down diffusion by a factor of about 5.75.

In [82]:
print("Analytical slow-down factor for diffusion is", tn.Measures.SlowDownFactor(t))
Analytical slow-down factor for diffusion is 5.7450563705

Again, we empirically confirm that this prediction is reasonable ...

In [83]:
speed_g2 = tn.Processes.RWDiffusion(t.igraphSecondOrder().components(mode="strong").giant(), epsilon=1e-6)
speed_g2n = tn.Processes.RWDiffusion(t.igraphSecondOrderNull().components(mode="strong").giant(), epsilon=1e-6)

print("Empirical slow-down factor for diffusion is", speed_g2/speed_g2n)
Empirical slow-down factor for diffusion is 5.555102040816326

We next use the RealityMining data set, covering proximity interactions between students at MIT. We read it as a temporal network.

In [84]:
t = tn.TemporalNetwork.readFile('data/RealityMining_agg_300s_scc.tedges', sep=' ')
print("Temporal network has", t.vcount(), "nodes")
print("Temporal network has", t.ecount(), "time-stamped edges")
Temporal network has 58 nodes
Temporal network has 24984 time-stamped edges

And again, the temporal sequence deviates from a Markovian temporal network and non-Markovian characteristics are expected to slow down diffusion by a factor of about 7.77, which we confirm empirically.

In [85]:
print("Entropy growth rate ratio is", tn.Measures.EntropyGrowthRateRatio(t))
Entropy growth rate ratio is 0.395843615116

In [86]:
print("Analytical slow-down factor for diffusion is", tn.Measures.SlowDownFactor(t))
Analytical slow-down factor for diffusion is 7.77413007766

In [87]:
speed_g2 = tn.Processes.RWDiffusion(t.igraphSecondOrder().components(mode="strong").giant(), epsilon=1e-6)
speed_g2n = tn.Processes.RWDiffusion(t.igraphSecondOrderNull().components(mode="strong").giant(), epsilon=1e-6)

print("Empirical slow-down factor for diffusion is", speed_g2/speed_g2n)
Empirical slow-down factor for diffusion is 8.07273163706193

Finally, we also find examples of temporal networks in which non-Markovian characteristics result in a speed-up. For this, we consider a data set of time-stamped passenger flows in the London Tube network.

In [9]:
t = tn.TemporalNetwork.readFile('data/tube_flows_scc.tedges', sep=' ')
print("Temporal network has", t.vcount(), "nodes")
print("Temporal network has", t.ecount(), "time-stamped edges")
Temporal network has 132 nodes
Temporal network has 225372 time-stamped edges

Interestingly, here non-Markovian characteristics result in a speed-up of diffusion, reflected in a slow-down factor smaller than one.

In [89]:
print("Entropy growth rate ratio is", tn.Measures.EntropyGrowthRateRatio(t))
print("Analytical slow-down factor for diffusion is", tn.Measures.SlowDownFactor(t))
Entropy growth rate ratio is 0.303866969488
Analytical slow-down factor for diffusion is 0.230196514165

This sounds counter-intuitive: the mere reordering of time-stamped links results in a speed-up. Let us validate this by means of a simulation.

In [90]:
speed_g2 = tn.Processes.RWDiffusion(t.igraphSecondOrder().components(mode="strong").giant(), epsilon=1e-6)
speed_g2n = tn.Processes.RWDiffusion(t.igraphSecondOrderNull().components(mode="strong").giant(), epsilon=1e-6)
print("Empirical slow-down factor for diffusion is", speed_g2/speed_g2n)
Empirical slow-down factor for diffusion is 0.2644334975369458

That's interesting, but what exactly is happening there? We can get a better idea by looking at a visualization of a diffusion process. For this purpose, pyTempNet comes with a function that conveniently generates a video of diffusion in a temporal network. The idea is that the process is visualized in the first-order time-aggregated network, while the diffusion dynamics can follow either a second-order or a first-order Markov model. This way, we can visualize the effect of order correlations on the dynamical process.

In [11]:
g1 = t.igraphFirstOrder()

visual_style = {}
visual_style["edge_arrow_size"] = 0.4
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_size"] = 10
visual_style["edge_width"] = [np.log(x)/4 for x in g1.es()["weight"]]
visual_style["layout"] = g1.layout_auto()

tn.exportDiffusionMovieFramesFirstOrder(t, 'frames/LT_diffusion_t2', visual_style, steps = 200, initial_index=0, model='SECOND')
Step 0  TVD = 0.998994875455
Step 10  TVD = 0.850402382077
Step 20  TVD = 0.423183750363
Step 30  TVD = 0.218607280454
Step 40  TVD = 0.128935555821
Step 50  TVD = 0.067007173767
Step 60  TVD = 0.0408314260111
Step 70  TVD = 0.021756091271
Step 80  TVD = 0.011043940819
Step 90  TVD = 0.00820040053235
Step 100  TVD = 0.00427294898294
Step 110  TVD = 0.00273305050069
Step 120  TVD = 0.00177101693373
Step 130  TVD = 0.000852162944137
Step 140  TVD = 0.000640758925641
Step 150  TVD = 0.000377945592419
Step 160  TVD = 0.000195032282883
Step 170  TVD = 0.000155387519435
Step 180  TVD = 7.18137801293e-05
Step 190  TVD = 4.85421284149e-05

The function will output the evolution of the total variation distance on the console, so that you can check whether the chosen number of steps was sufficient for the diffusion process to converge to a reasonable degree. The function also generates video frames, which we can again convert into a video using ImageMagick's convert tool as follows:

> convert -delay 10 frames/LT_diffusion_t2_frame* output/LT_t2.mp4

We can then again embed the resulting mp4 file directly into the notebook. The resulting video shows how a diffusion process evolves in the temporal network based on the actual two-path statistics in the data.

In [13]:
showVideo('output/LT_t2.mp4')
Out[13]:

In the video, we observe a fast diffusion through the network. Let us now compare this to a diffusion process based on the first-order time-aggregated network, i.e. based on the two-path statistics that we would expect.

In [14]:
tn.exportDiffusionMovieFramesFirstOrder(t, 'frames/LT_diffusion_t1', visual_style, steps = 200, initial_index=0, model='NULL')
Step 0  TVD = 0.998994875455
Step 10  TVD = 0.66652125666
Step 20  TVD = 0.468552901908
Step 30  TVD = 0.345080815329
Step 40  TVD = 0.275234216273
Step 50  TVD = 0.230621363575
Step 60  TVD = 0.197581123249
Step 70  TVD = 0.172050885691
Step 80  TVD = 0.151360475019
Step 90  TVD = 0.134013815265
Step 100  TVD = 0.119109383406
Step 110  TVD = 0.10609152886
Step 120  TVD = 0.0946078779571
Step 130  TVD = 0.0844203096597
Step 140  TVD = 0.0753549257781
Step 150  TVD = 0.0672753063726
Step 160  TVD = 0.0600685028221
Step 170  TVD = 0.0536376790085
Step 180  TVD = 0.0478981231113
Step 190  TVD = 0.0427749373529

Again, we use the generated frames to produce an mp4 video by calling on the console:

> convert -delay 10 frames/LT_diffusion_t1_frame* output/LT_t1.mp4

In [17]:
showVideo('output/LT_t1.mp4')
Out[17]:

That's interesting! We observe that the diffusion process based on the first-order model, in which two-path statistics correspond to what we expect based on the weighted time-aggregated network, is indeed slower than in the second-order model. We also see why this is the case: the numerous loops in the first-order model slow down diffusion, while the actual two-path statistics of the passenger flow data yield a much faster diffusion that quickly reaches all nodes in the network.

In fact, this is not surprising, as people do not travel the London Tube in loops. It is exactly this temporal characteristic that we miss if we merely study time-aggregated networks, and that we can include in our analysis by means of a second-order representation.

Spectral analysis: demonstration using TRIGRAM files

Let us now come to the last part of our tutorial, introducing some basic support for a spectral analysis of temporal networks. Here, rather than using a TEDGE file containing time-stamped edges from which we extract time-respecting paths of length two, we use so-called TRIGRAM files. These files directly contain the statistics of time-respecting paths of length two. They are useful whenever direct data on paths or flows in dynamic networks is available.

The format of a TRIGRAM file is the following:

node1 node2 node3 weight
a b c 3.0
u v w 4.0
...

In the example above, the time-respecting path (a,b) -> (b,c) occurs three times, while the time-respecting path (u,v) -> (v,w) occurs four times.
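To make the format concrete, here is a minimal plain-Python sketch of a TRIGRAM reader (a hypothetical helper for illustration; in practice, pyTempNets reads these files via TemporalNetwork.readFile as shown below):

```python
from collections import defaultdict

def read_trigram(filename, sep=' '):
    """Read a TRIGRAM file into a dict mapping (a, b, c) to the
    aggregate weight of the time-respecting path (a,b) -> (b,c)."""
    twopaths = defaultdict(float)
    with open(filename) as f:
        for line in f:
            fields = line.strip().split(sep)
            if len(fields) != 4:
                continue                  # skip malformed lines
            try:
                weight = float(fields[3])
            except ValueError:
                continue                  # skip a header line
            a, b, c = fields[:3]
            twopaths[(a, b, c)] += weight
    return twopaths
```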

Based on this format, in the following we use synthetic TRIGRAM data generated from a model which shows how non-Markovian characteristics in temporal networks can both slow down and speed up dynamical processes.

Similar to what we have seen above, the first file corresponds to a case where non-Markovian properties speed up a diffusion process.

In [91]:
t_su = tn.TemporalNetwork.readFile('data/sigma0_75.trigram', fformat='TRIGRAM', sep = ' ')

print("Entropy growth rate ratio is", tn.Measures.EntropyGrowthRateRatio(t_su))
Entropy growth rate ratio is 0.996687767988
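The quantity printed above is the ratio between the entropy growth rate of the observed second-order network and that of its Markovian null model. For a Markov chain with row-stochastic transition matrix T and stationary distribution pi, the entropy growth rate is H = -Σ_i pi_i Σ_j T_ij log2 T_ij. A minimal sketch of this computation (an assumed form for illustration, not pyTempNets' exact implementation):

```python
import numpy as np

def entropy_growth_rate(T, pi):
    """Entropy growth rate of a Markov chain with row-stochastic
    transition matrix T and stationary distribution pi."""
    with np.errstate(divide='ignore', invalid='ignore'):
        logT = np.where(T > 0, np.log2(T), 0.0)   # convention: 0 * log 0 = 0
    return -np.sum(pi[:, None] * T * logT)

# deterministic chain: every row has a single 1 -> zero entropy rate
T_det = np.array([[0.0, 1.0], [1.0, 0.0]])
pi_det = np.array([0.5, 0.5])
print(entropy_growth_rate(T_det, pi_det))

# maximally random chain on two states -> 1 bit per step
T_rand = np.full((2, 2), 0.5)
print(entropy_growth_rate(T_rand, pi_det))
```

A ratio below one thus means the observed second-order dynamics are more predictable than the Markovian expectation.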

Again, the entropy growth rate ratio is smaller than one, verifying that the temporal network has non-Markovian characteristics. We can test the effect of these non-Markovian characteristics by computing the analytical slow-down factor. Here it is smaller than one, verifying a speed-up of diffusion dynamics.

In [92]:
print("Analytical slow-down factor for diffusion is", tn.Measures.SlowDownFactor(t_su))
Analytical slow-down factor for diffusion is 0.688363359047

Let us have a closer look at this temporal network. The first-order aggregate network consists of two communities.

In [93]:
g1 = t_su.igraphFirstOrder()

visual_style = {}
visual_style["edge_width"] = [np.sqrt(x) for x in g1.es()["weight"]]
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_size"] = 10
visual_style["edge_arrow_size"] = 0.5
visual_style["layout"] = g1.layout_auto()

igraph.plot(g1, 'output/model_0_75_g1.png', **visual_style)
display(Image(filename='output/model_0_75_g1.png'))

These two communities are visible in the expected second-order network as well.

In [94]:
g2n = t_su.igraphSecondOrderNull()

visual_style = {}
visual_style["edge_width"] = [200*x for x in g2n.es()["weight"]]
visual_style["edge_arrow_size"] = 0.01
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_size"] = 5
visual_style["layout"] = g2n.layout_auto()

igraph.plot(g2n, 'output/model_0_75_g2n.png', **visual_style)
display(Image(filename='output/model_0_75_g2n.png'))

In the actual second-order time-aggregated network, we observe that communities are (temporally) more connected than expected at random, thus explaining the speed-up.

In [95]:
g2 = t_su.igraphSecondOrder()

max_weight = max(g2.es()["weight"])
visual_style["edge_width"] = [x/max_weight for x in g2.es()["weight"]]
visual_style["edge_arrow_size"] = 0.01
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_size"] = 5

igraph.plot(g2, 'output/model_0_75_g2.png', **visual_style)
display(Image(filename='output/model_0_75_g2.png'))

We can gain some more (analytical) insight into this by means of a spectral analysis of the higher-order networks. We first calculate the algebraic connectivity of the second-order network. A comparison with the algebraic connectivity of the (expected) Markovian temporal network shows that the temporal network is indeed better connected than expected.

In [96]:
print("Algebraic Connectivity (G2) =", tn.Measures.AlgebraicConn(t_su))
print("Algebraic Connectivity (G2 null) =", tn.Measures.AlgebraicConn(t_su, model="NULL"))
Algebraic Connectivity (G2) = 0.0147085987469
Algebraic Connectivity (G2 null) = 0.0101482158018
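The algebraic connectivity reported above is the second-smallest eigenvalue λ2 of a graph Laplacian: the larger it is, the better connected the network and the faster diffusion converges. For the directed, weighted second-order networks pyTempNets uses a suitable Laplacian internally; as an illustration, here is a minimal sketch for the simpler undirected case (plain NumPy, not the library's implementation):

```python
import numpy as np

def algebraic_connectivity(A):
    """Second-smallest Laplacian eigenvalue of an undirected
    graph given by its symmetric adjacency matrix A."""
    L = np.diag(A.sum(axis=1)) - A          # graph Laplacian L = D - A
    eigvals = np.sort(np.linalg.eigvalsh(L))
    return eigvals[1]                       # lambda_2, the Fiedler value

# path graph 0-1-2-3: weakly connected -> small lambda_2
A_path = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]], dtype=float)
# complete graph on 4 nodes: well connected -> large lambda_2
A_full = np.ones((4, 4)) - np.eye(4)

print(algebraic_connectivity(A_path))   # ≈ 0.586
print(algebraic_connectivity(A_full))   # ≈ 4.0
```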

We can get a better perspective on this by studying the so-called Fiedler vector, i.e. the eigenvector corresponding to the second-smallest eigenvalue of the Laplacian matrix of the higher-order aggregate network. This vector can be computed with pyTempNet right away. We can then plot the entries of this vector corresponding to individual nodes in the second-order network (i.e. links in the first-order network).

In [97]:
fiedler = tn.Measures.FiedlerVector(t_su)

plt.clf()
plt.tick_params(axis='both', which='major', labelsize=20)
plt.tick_params(axis='both', which='minor', labelsize=20)       
plt.xlabel('$i$', fontsize=30)
plt.ylabel(r'$(v_2)_i$', fontsize=30)
plt.subplots_adjust(bottom=0.25)
plt.subplots_adjust(left=0.25)
plt.scatter(range(len(fiedler)), np.real(fiedler), color="r")
plt.savefig("output/model_0_75_fiedler.png")
plt.close()

display(Image(filename='output/model_0_75_fiedler.png'))

We see two clear ranges of values below and above zero, which are a sign of two pronounced communities. At the same time, the points connecting these two ranges show that a number of nodes lie in between these communities. Interestingly, if we compare the Fiedler vector to that of the expected second-order network, we see that community structures in the real temporal network are not as strong as expected at random, i.e. in this case the non-Markovian properties of the temporal network actually weaken community structures!

In [98]:
fiedler = tn.Measures.FiedlerVector(t_su, model="NULL")

plt.clf()
plt.tick_params(axis='both', which='major', labelsize=20)
plt.tick_params(axis='both', which='minor', labelsize=20)       
plt.xlabel('$i$', fontsize=30)
plt.ylabel(r'$(v_2)_i$', fontsize=30)
plt.subplots_adjust(bottom=0.25)
plt.subplots_adjust(left=0.25)
plt.scatter(range(len(fiedler)), np.real(fiedler), color="r")
plt.savefig("output/model_fiedler_null.png")
plt.close()

display(Image(filename='output/model_fiedler_null.png'))
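As an aside, the sign pattern of the Fiedler vector is what encodes the bipartition visible in these plots: entries of one sign belong to one community, entries of the opposite sign to the other. A toy illustration on a small undirected graph (plain NumPy, unrelated to the model data above):

```python
import numpy as np

# two triangles (0,1,2) and (3,4,5) joined by a single bridge edge 2-3
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

L = np.diag(A.sum(axis=1)) - A          # graph Laplacian L = D - A
eigvals, eigvecs = np.linalg.eigh(L)    # eigenvalues in ascending order
fiedler = eigvecs[:, 1]                 # eigenvector of second-smallest eigenvalue

# the sign of each entry assigns its node to one of the two triangles
print(np.sign(fiedler))
```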

We now consider a second example: a temporal network which, although consistent with the same time-aggregated network shown above, is generated in such a way that non-Markovian characteristics actually slow down diffusion. We can again read the resulting TRIGRAM file and compute the entropy growth rate ratio and the analytical slow-down factor.

In [99]:
t_sd = tn.TemporalNetwork.readFile('data/sigma-0_75.trigram', fformat='TRIGRAM', sep = ' ')

print("Entropy growth rate ratio is", tn.Measures.EntropyGrowthRateRatio(t_sd))
Entropy growth rate ratio is 0.991543696969

In [100]:
print("Analytical slow-down factor for diffusion is", tn.Measures.SlowDownFactor(t_sd))
Analytical slow-down factor for diffusion is 5.03547933646

Here, our analysis shows that diffusion is expected to be slower by a factor of five, which we can again verify using simulations.

In [101]:
speed_g2 = tn.Processes.RWDiffusion(t_sd.igraphSecondOrder().components(mode="strong").giant(), epsilon=1e-6)
speed_g2n = tn.Processes.RWDiffusion(t_sd.igraphSecondOrderNull().components(mode="strong").giant(), epsilon=1e-6)

print("Empirical slow-down factor for diffusion is", speed_g2/speed_g2n)
Empirical slow-down factor for diffusion is 5.030731025919604

A spectral analysis shows that the algebraic connectivity of the second-order network is much smaller than before, which is due to the fact that in this particular example non-Markovian characteristics reinforce the community structures. Compared to the algebraic connectivity of the expected Markovian temporal network, here non-Markovian characteristics result in a less connected causal topology, thus explaining the slow-down.

In [102]:
print("Algebraic Connectivity (G2) =", tn.Measures.AlgebraicConn(t_sd))
Algebraic Connectivity (G2) = 0.00202358811696

This can also be seen in the distribution of entries of the Fiedler vector. The two value ranges are much more separated, which means that community structures are stronger than before. Furthermore, they are stronger than in the null model.

In [107]:
fiedler = tn.Measures.FiedlerVector(t_sd)

plt.clf()
plt.tick_params(axis='both', which='major', labelsize=20)
plt.tick_params(axis='both', which='minor', labelsize=20)       
plt.xlabel('$i$', fontsize=30)
plt.ylabel(r'$(v_2)_i$', fontsize=30)
plt.subplots_adjust(bottom=0.25)
plt.subplots_adjust(left=0.25)
plt.scatter(range(len(fiedler)), np.real(fiedler), color="r")
plt.savefig("output/model_-0_75_fiedler.png")
plt.close()

display(Image(filename='output/model_-0_75_fiedler.png'))

Notably, both the first-order aggregate network and the expected second-order network (both shown below) are exactly the same as before, since the only difference between the two temporal networks is the ordering of time-stamped edges.

In [106]:
g1 = t_sd.igraphFirstOrder()

visual_style = {}
visual_style["edge_width"] = [np.sqrt(x) for x in g1.es()["weight"]]
visual_style["vertex_color"] = "lightblue"
visual_style["vertex_size"] = 10
visual_style["edge_arrow_size"] = 0.5
visual_style["layout"] = g1.layout_auto()

igraph.plot(g1, 'output/model_-0_75_g1.png', **visual_style)
display(Image(filename='output/model_-0_75_g1.png'))

In [106]:
g2n = t_sd.igraphSecondOrderNull()

visual_style["edge_width"] = [200*x for x in g2n.es()["weight"]]
visual_style["edge_arrow_size"] = 0.01
visual_style["vertex_size"] = 5

igraph.plot(g2n, 'output/model_-0_75_g2n.png', **visual_style)
display(Image(filename='output/model_-0_75_g2n.png'))

However, the actual second-order aggregate network is different. Here we observe that, compared to the second-order network above, non-Markovian characteristics reinforce community structures.

In [105]:
g2 = t_sd.igraphSecondOrder()

visual_style["edge_width"] = [x for x in g2.es()["weight"]]
visual_style["vertex_size"] = 5
visual_style["edge_arrow_size"] = 0.01

igraph.plot(g2, 'output/model_-0_75_g2.png', **visual_style)
display(Image(filename='output/model_-0_75_g2.png'))

I hope you enjoyed this tutorial and that you are now excited to apply the methods outlined above to your own time-stamped relational data. As put forth in our recent paper, we believe that the higher-order networks perspective provides interesting opportunities for novel data mining methods, new measures for centrality in temporal networks, and the development of novel community detection algorithms that incorporate both the topological and temporal characteristics of time-varying complex systems.

Feel free to get in touch with me if you have further questions.

Ingo Scholtes
ischoltes@ethz.ch

Zurich, May 2015